NORM-REFERENCED STANDARDIZED MATHEMATICS ACHIEVEMENT
TESTS AT THE SECONDARY SCHOOL LEVEL AND THEIR
RELATIONSHIP TO THE NATIONAL ASSESSMENT
OF EDUCATIONAL PROGRESS CONTENT
OBJECTIVES AND SUBOBJECTIVES

Standardized test publishers claim they use different sources for
the contents of their examinations. Textbooks, curriculum guides and the
opinions of leading educators are usually mentioned in the manuals and
handbooks which accompany the tests. One potential source is absent from
the references accompanying the current versions of five leading
standardized test series: the National Assessment of
Educational Progress findings.

The reasons underlying the publishers' choice of sources are not an
issue in this research. Clearly, the scope and the intensity of the NAEP's
efforts posed against the publishers' choices form a dilemma which other
researchers may elect to examine. This research will study the extent to
which current norm-referenced standardized achievement tests reflect
the curriculum espoused by the NAEP in one subject,
secondary mathematics.

Research designed to show if standardized tests varied over time
in terms of their content revealed that few changes had taken place. In
this research, which was limited to elementary mathematics, the list of
objectives prepared by the NAEP for its initial mathematics assessment in
1972-73 served as the criterion. Further analyses showed that a
relatively small percent of the NAEP objectives and their components or
subobjectives are assessed by the standardized tests.

This finding must be qualified because the list of objectives and
their components may not be suitable for all students in America despite
educators' assertions that all of the topics may be part of an elementary
school mathematics program. Secondary school mathematics programs
have not been studied under the NAEP criterion and the present study was
conducted in order to determine the degree to which current
norm-referenced standardized achievement tests attend to the NAEP
objectives and subobjectives. Reliability and validity for the procedures
had been established earlier and will be described in detail
later in this paper.

Five standardized test series were used in the study: the Stanford
Achievement Test, the Metropolitan Achievement Tests, the California
Achievement Tests, the Comprehensive Tests of Basic Skills and the SRA
Achievement Series. The researcher examined each item in each test and
assigned it to one of the NAEP subobjectives. Chi-square was used to
analyze the data.

The National Assessment of Educational Progress (NAEP) is a
cyclical test in which tests in ten instructional areas are
alternated. Examinations in music, reading, writing, art,
citizenship, mathematics, science, social studies, literature and
occupational development are administered to carefully selected samples
at four age levels: nine, thirteen, seventeen, and twenty-six to thirty-five.
Results are presented by age, sex, geographic region, race, community type
and parental education status.

Many benefits have been derived from NAEP, not the least
of which is the refinement of methodologies for implementing
large-scale exercise development and data collection activities.
It is hoped that the data have influenced school administrators and
federal and state legislators to make rational decisions about the
allocations of money for educational programs.

The NAEP is a project of the Education Commission of the States.
Designed to determine the nation's progress in education, this project is
funded by the National Center for Education Statistics. This assignment
was given to the Education Commission of the States by the U. S. Office of
Education at its inception in 1867.

Prior to the NAEP's work in assessment, measures of educational
quality were based on categorical or demographic information. This
information included teacher-student ratios, class size, number of
classrooms, and per-pupil expenditures among other data. Meaningful
outcome measures were not available. Formal testing programs produced
information which could be used to categorize students but these data did
not yield information on individual student learning.

Dr. Francis Keppel, the United States Commissioner of Education
from 1962 to 1965, became concerned about this dearth of information and
initiated a series of conferences designed to find ways to collect data on
the nation's progress in education. Keppel's approach bore fruit in 1964
when a group of educators formed the Exploratory Committee on Assessing
the Progress of Education (ECAPE). Ralph W. Tyler chaired ECAPE and
directed the committee toward determining the feasibility of conducting a
national assessment. ECAPE reported that this project was feasible and
the responsibility for conducting the assessment was given to the
Education Commission of the States and named the National Assessment of
Educational Progress in 1968. Funding was supplied by the National Center
for Education Statistics.

More than one hundred years passed between the government's
assignment to collect data on the status of education in the United States
in 1867 and the initiation of the program designed to carry out this task in
1968. Since its start, the NAEP has assessed more Americans in more
areas than any other federally sponsored testing program in the nation's
history. The 1972-73 mathematics assessment, for instance, included
data from as^OOO nine-year-olds, 30,000 thirteen-year-olds, 33,000
seventeen-year-olds and 4,500 young adults.

The NAEP has been directed by leading American educators and
has conducted three assessments in mathematics: 1972-73, 1977-78 and
1982-83. While other systems for categorizing mathematics content have
been prepared, none have included as many categories as the system
constructed for the first assessment. Because of this comprehensive
structure, the domain defined and used by the NAEP for its first
mathematics assessment was examined and used in this study.

A carefully planned series of activities underscored the
development of the content categories used in the NAEP's first
assessment of mathematics. The first task faced by the NAEP was to
define the universe for each skill assessed while ascertaining that each
task was properly defined. According to Wilson, defining the universe
included the knowledges, skills and attitudes related to the subject while
excluding those which were not. On the other hand, a list of this type
would be far too long for meaningful assessment purposes and should be
reduced in length by using only relevant items in the process.

Wilson was not satisfied with the construct of universe as applied
to the NAEP's purposes for the first assessment of mathematics.

It is clearly beyond the current state of the art to define the
universe of behaviors for a complex area in the strict sense discussed
above. Yet, it is equally clear that a set of exercises (test items)
which form a coherent assessment of a subject area cannot be
constructed without some definition of the domain to be tested.

The NAEP took a judgmental approach to this dilemma by relying
on individuals' opinions as opposed to logic or statistics. Thus, NAEP's
universe was defined by

... a set of objectives that represents a consensus of opinion
covering many segments of our society regarding the important
goals and outcomes of our educational processes in respect to
a given subject area.

The NAEP divided mathematics content into seventeen
instructional content objectives. Fifteen content objectives included
subdivisions which were called content subobjectives for this study.
Overall, 126 content subobjectives were stated. Exercises were prepared
to measure the objectives at each of the appropriate age levels set
by the NAEP.12

At first, the NAEP objectives and subobjectives were prepared by
subcontractors. These objectives, according to Wilson, differed in quality
and tended to assess only those areas most amenable to measurement.
Topics which were not amenable to measurement, but no less important to
educators, were not examined. Later, the development of the
objectives and subobjectives was assigned to the NAEP's Exercise
Development Department.

Comments on the NAEP's work have varied. The NAEP has been
criticized by Womer and Mastie, who questioned the use and application of
the results of the assessments, and Katzman and Rosen, who inferred
that the research design followed by the NAEP was influenced by political
considerations.15 On the other hand, Greenbaum et al. pointed out that the
NAEP has taken steps to answer the questions posed by its critics.
Payne claimed that many benefits resulted from the NAEP's work including
strategies for constructing exercises and collecting data.

Additionally, a number of educators reported their use of NAEP
data and the assistance provided by this information. McKillip used NAEP
data to recommend improvements in teaching division. Carpenter
claimed that NAEP data offered insights into areas of performance
differences in calculations with decimals for thirteen-year-old students
who had been given instruction in the skill as opposed to nine-year-old
students who had not been given instruction. Post was able to suggest
techniques designed to help students improve their skills in adding
fractions,20 while Kahle used NAEP data to suggest approaches for
increasing minority student enrollment in science.21 Lapointe and Koffler
stated that NAEP data can help educators develop national
educational standards.22

The evidence shows that the NAEP has constructed a
comprehensive, orderly system for categorizing mathematics content in
order to assess individual performance. This system is composed of
seventeen content objectives which apply to four age groupings. Fifteen
objectives include subobjectives.

Exercises have been prepared for each content objective and
subobjective. These exercises vary as a function of the age grouping for
which they are designed. Thus, the NAEP has set up a system which
categorizes mathematics content through a series of objectives and
assesses performance through the achievement of these objectives.

The NAEP reviews the objectives continually and revises them as a
result of this review. These modifications are implemented and each
mathematics assessment has differed from the previous one because of
the NAEP's concerns about its evaluation procedures.

The NAEP attends to the mathematics domain. Standardized
mathematics achievement tests should do so as well. Therefore, the
content objectives and subobjectives constructed by the NAEP could be
used to categorize standardized test content. Other systems are
available but we found none which were as comprehensive as that
prepared by the NAEP.

Analyzing Standardized Test Content

Investigators have commented on standardized achievement tests
since the first edition of the Stanford Achievement Test was published in
1922.23 These comments have ranged from simple descriptions to
detailed statistical analyses of the instruments.25 At first, researchers
restricted themselves to presenting categorical information about the
tests they examined. Later, the researchers gave their opinions of the
quality of the tests they studied as well.

Educational, Psychological and Personality Tests of 1933 and 1934
(EPT) was the first reference work encountered in the course of this
review which provided information on the quality of the tests examined as
well as categorical information on them. For EPT, professors of
education and testing specialists commented on the characteristics of the
tests they reviewed. This policy has continued through the Mental
Measurements Yearbook series. The Ninth Mental Measurements Yearbook,
for instance, contained references to 1,409 tests with 660 educators
contributing reviews.28

Test reviews of the type found in the Mental Measurements
Yearbook series also appear in the literature at large. The Journal of
Educational Measurement, for example, publishes test reviews
continuously. Other journals do so as well. Education Index, a reference
work which catalogs articles in periodicals dealing with education, listed
1,174 citations over 23 pages for its Tests and Scales category in the
thirty-fifth volume of this series. This volume covered the twelve month
period which began in July, 1984.29 Clearly, educational measurement and
the analysis of instruments used in assessment make up a meaningful
portion of the educational literature.

While a considerable amount of information on assessment
instruments has appeared in the educational literature, little attention
has been given to comprehensive analyses of current tests in a single
subject. Researchers who have worked in testing have devoted their
attention to other concerns.

Robert Floden et al. found that the content covered by four
standardized mathematics tests designed for use with fourth grade
students varied, with the differences among them leading to possible
consequences in terms of instruction. Floden et al. used the Stanford
Achievement Test (SAT), the Iowa Tests of Basic Skills (ITBS), the
Metropolitan Achievement Tests (MAT) and the Comprehensive Tests of
Basic Skills (CTBS) in their study. The contents of these tests were
compared to the topics a teacher might cover in his classroom. For
operations, one component of the system used by the investigators, test
content was similar in terms of the percentage of items devoted to
certain specific tasks in addition, subtraction and division. However,
differences were observed for the percentage of test items assigned to
the topic of addition, overall. In the MAT, twenty-one percent of the items
were assigned to addition while the other tests devoted between twelve
and fourteen percent of their items to this topic.

More similarities than differences were found among the tests
examined, but the differences were important according to the
researchers. Six percent of the CTBS items dealt with percentage while
the other tests contained no items to test this skill. Alternative number
systems were examined in the MAT and SAT but not in the other tests.

A school district that emphasizes work with percentages in fourth
grade would get a distorted picture of progress from the Iowa, which
contains no percentage problems. On the other hand, a district which
does not introduce percents until the sixth grade would be
unnecessarily discouraged by the results of the CTBS which contains
six percent problems (sic) involving percentages.

Floden et al. did not identify the levels of the achievement tests
they used in their study. Some publishers use grade overlaps at terminal
grades for their achievement tests. Thus norms for a fourth grade student
appear in Level 2 and Level 3 of the 1970 CAT. The writers should have
specified the levels of the achievement tests they used in their study.
Similarly, the level of the test used differs with the time the test is
administered. A fourth grade student who is tested in the fall of the
school year would take the Primary III Battery of the 1973 edition of the
SAT, but if he is tested in the spring, he would take the Intermediate I
version. (The SAT does not overlap terminal test grades.) Had the
investigators provided this information, readers would be able to make
more appropriate judgments. This shortcoming limits the study's findings.

Bonnie Armbruster, Robert Stevens and Barak Rosenshine looked at
the coverage of three curricula by two tests. The researchers wanted
to identify similarities and differences among the instruments as well as
subject emphases. All of the materials were designed to assess the
performance of third grade students. The reading series used in the study
differed in accordance with their emphasis on reading comprehension,
generally, and certain categories subsumed by this skill.

The standardized tests were similar with regard to their emphasis
on reading comprehension, but all of the tests differed from the reading
series used in the study in this respect. The researchers claimed that this
finding showed that tests and texts differed in their content coverage.
Moreover, a large percentage of the comprehension items taught were not
assessed by the standardized tests. Of sixteen categories constructed for
the study, no more than seven were taken up by the tests. Most of the test
items focused on detail and paraphrasing while inference was emphasized
in the texts.

Although there was a strong discrepancy between teaching and
testing, the standardized tests used in the study were similar in terms of
the topics examined.

Despite unanswered questions, the present study is important in
its demonstration of a feasible methodology for addressing a
long-neglected research problem - determining content coverage
and content emphasis of both curricula and tests. More such studies
comparing curricula and tests in different content areas and grade
levels are needed.

Worthy wanted to determine if there were significant differences
in the reading skills measured on two standardized tests, the CTBS and the

The researcher did find the hypothesized differences.

Each of the six null hypotheses stating that there was no
significant difference between the reading skills emphasized at the
third grade level in the three basal reading series and the skills
measured on the two standardized achievement tests was rejected.

Freeman et al. questioned the use of standardized test scores for
instructional purposes.

Specifically, teachers are encouraged to use standardized test
scores to evaluate student achievement on both a group and
individual level, to identify students with learning problems, and to
assess the effectiveness of instructional strategies that
have been used.

The use of standardized test scores for any of these functions,
however, must be tempered by the teacher's knowledge of the extent to
which the content of the test parallels the content of instruction.
Differences in textbooks, school objectives and teacher behaviors as well
as the contents of standardized tests may contribute to a discontinuity
between content and instruction. This lack of consistency will generate
scores which will underestimate student achievement.

The investigators used four standardized elementary mathematics
achievement tests in their study: the 1973 SAT, the 1970 MAT, the 1976
CTBS and the 1971 ITBS. Then, Freeman et al. constructed a taxonomy
which was made up of three components: (1) presentation mode, (2)
material and (3) operations. Through this strategy, the investigators
uncovered some differences in test content. For students enrolled
in fourth grade, for instance, sixty-three percent of the SAT test
items involved whole numbers while the MAT assigned
fifty-three percent of its items to this area, and the ITBS, forty-five
percent. Other differences were cited but they were not as meaningful as
the content assignments.

Freeman et al. concluded that standardized mathematics tests
differ in their content. Therefore, the match between the subjects taught
and assessed will differ if an inappropriate standardized test is selected.
The curriculum may be changed in accordance with the test but this
procedure may not be in the students' best interests.

The match between content taught and content tested is a crucial
context for using tests to diagnose student strengths and weaknesses
as well as for assessing the strengths and weaknesses of
instruction provided.

Educators who decide to use standardized mathematics
achievement tests for student assessment may be interested in
determining the extent to which their objectives have been achieved.
Some tests may be more sensitive to certain objectives than others.
Moreover, a test, to some extent, must be sensitive to the objective of
assessment or it will not serve as a sound indicator for that purpose.
Researchers have made attempts to determine how well a test completes
its purpose, but these attempts have included small numbers of tests.
This study will analyze the contents of five current standardized
secondary mathematics achievement tests in an attempt to provide a
comprehensive data base for educators interested in selecting appropriate
norm-referenced standardized achievement tests in order to assess their
students' performance.

Procedures 

At the time of the study's inception, The Ninth Mental
Measurements Yearbook was the most comprehensive listing of
commercially prepared standardized achievement tests in the United
States. Therefore, this reference work is an important source for
educators working in any area which calls for assessment.

Reviews of The Ninth Mental Measurements Yearbook had not
reached the literature when this study began because of its recent
publication. However, reviews of past editions were positive, attesting to
the Mental Measurements series' value to educators. Raths recommended
The Seventh Mental Measurements Yearbook to "anyone interested in
research and/or evaluation." ^^^^^ g^gg commented on O. K. Buros'
outstanding contribution to the profession. Proger called attention to
the enormous effort made by the "father of test reviews," while
Englehard praised Buros by reporting:

It is difficult to find unused superlatives to characterize The
Seventh Mental Measurements Yearbook and its predecessors. They
are indeed weighty - 36 pounds on my bathroom scale. I can't believe
I read the whole thing, but I have read enough to conclude by saying
"Oscar, you are incredible!"

Wilson called the Eighth Mental Measurements Yearbook "a work of
immense proportion" and a comprehensive source of information
because the editor cited the strengths and weaknesses of each test listed
in the test profiles. Through this approach, decisions regarding test
selection and use are left to the reader.

Adams described Buros as a critic who looked for honest,
objective appraisals from his reviewers, who were asked to criticize poor
work, call attention to good work and make suggestions for improvements.
"Test entries for the 1,184 tests in the Eighth MMY are complete, accurate
and helpful." Thompson asked that someone continue Buros' work so that
those who deal with tests will not have to rely on a trial and error
approach to evaluate them in the future.

Validity: Since no studies were encountered during the course of
the literature review which established the validity and reliability of the
NAEP classifications for analyzing content, this determination became the
first step in the study at hand. "Although evidence may be accumulated in
many ways, validity always refers to the degree to which that evidence
supports the inferences that are made from the scores. The inferences
regarding specific uses are validated, not the test itself." In this
context, questions dealing with validity are directed toward inferences
about the subject of the assessment and inferences about other behaviors.
To answer the first question, the researcher must take steps to determine
how well the instrument samples the domain of the topic measured. For
the second question, the researcher must assess the value of the
instrument as a predictor of other behaviors.

Three types of evidence may be used to describe assessment
instruments with regard to validity: criterion-related evidence,
content-related evidence and construct-related evidence. The same
information may be used for each form of validity. The approach employed,
however, differs according to the type.

Criterion-related evidence "demonstrates that test scores are
systematically related to one or more outcome criteria." Here the
criterion is the variable of primary interest to the researcher. Naturally,
the choice of the criterion and the means used to examine it are crucial
matters. Researchers may use two strategies to collect evidence for
establishing criterion-related validity. For predictive work, the
researcher seeks information designed to estimate criterion scores which
will emerge at some time in the future. For concurrent work, both sets of
information are collected simultaneously. The choice of strategies
depends upon the researcher's concerns. In a general sense, the difference
between the two forms of evidence is based on time.

Content-related evidence is used to determine if a sample of
behaviors represents those of the domain under study. Therefore, the
researcher must ascertain if the items included in the instrument are
similar to those making up the domain. Since content-related evidence is
a major concern during instrument development procedures, professional
judgment takes on a key role in terms of deciding what will be measured
by the instrument.

Construct-related evidence looks at the test score as a measure of
the psychological characteristic under study and may be implied as the
researcher examines a criterion for the construct. In turn, a construct
may be defined as something which cannot be observed but is stated by the
investigator to summarize regularities in a person's behavior. Thus, a
construct is an idea prepared by the investigator to explain and organize
an aspect of existing knowledge. "... because construct validation refers to
a broader and more abstract kind of behavioral description, and because
there is no single acceptable criterion measure against which to validate a
measure of a construct, construct validity typically requires the gradual
accumulation of evidence from a number of sources."53 The accumulation
of evidence from various sources was cited by Sax, who listed six steps in
the construct validation process: (1) justifying the construct in terms of
its educational and psychological properties, (2) distinguishing the
construct examined from similar constructs, (3) measurability, (4)
acquiring evidence from different sources, (5) demonstrating that the
construct does not correlate highly with irrelevant variables and
(6) modifying the construct in accordance with the evidence gathered.

Face validity was not discussed in the 1985 edition of Standards
but was attended to in the 1974 version. It is mentioned here because of
its classical value and for completeness. Face validity is the appearance
of validity, has no value for developing inferences from scores, and is the
"reasonableness and acceptability of a test for use with a particular
group."55 While face validity may be useful in some instances, it cannot
be used as a substitute for the other forms of evidence because judgments
of face validity cannot be used to develop conclusions as to how faithfully
a score represents the topic in question or to predict behavior.

Reliability: Reliability is "the degree to which test scores are
free from errors of measurement." A respondent's effort, ease, and
fatigue among other characteristics may vary from one test
administration to another. Consequently, scores will differ
from one test to another.

At least two sets of measurements are necessary for estimating
reliability. These measures may be obtained by administering the same
instrument to a subject twice or giving either alternate or parallel forms
of the instrument to different subjects believed to have no biases which
would affect their responses. The first approach does not control for
memory and accepts this factor as a potential systematic source of
variance. The second approach does not control for item inequivalence and
accepts this factor as a potential source of systematic variance.
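
The test-retest approach described above is often summarized by
correlating the two sets of scores. The following is a minimal sketch,
assuming a Pearson product-moment coefficient is the estimate of interest;
the source does not name a coefficient, and the function name and the
sample scores are hypothetical.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products and sums of squares about the means.
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("scores must vary in both administrations")
    return cross / math.sqrt(ss_1 * ss_2)


# Hypothetical scores for five respondents tested twice.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]
print(round(pearson_r(first_administration, second_administration), 3))
```

A coefficient near 1 would suggest stable scores across administrations,
although it cannot separate true stability from the memory effect noted above.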

The reliability and validity of the NAEP objectives and
subobjectives for use in this study were determined by forming a panel of
referees who were asked to comment on the use of the objectives and
subobjectives for this purpose. The referees selected for this component
of the study were employees of the School District of Philadelphia. Each
referee had at least five years of classroom teaching experience in the
Philadelphia public schools, held an advanced degree in mathematics
education, had taken at least one course in tests and measurements or
statistics and was serving as a principal, mathematics supervisor or
mathematics coordinator when the study was conducted.

Each referee was asked to categorize the items from three sample
tests. The researcher joined this component of the study by categorizing
the items twice. The researcher's categorizations were separated by ten
days in order to help control for memory. This procedure was designed
to establish interrater and intrarater reliability. In both instances,
acceptable reliability coefficients emerged from the analysis.
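
Agreement between two sets of categorizations, whether from two referees
or from one rater on two occasions, can be illustrated as follows. This is
only a sketch: the source does not say which reliability coefficient was
computed, so percent agreement and Cohen's kappa are assumptions here, and
the category labels are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items that both ratings place in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical subobjective labels assigned to four test items.
first_pass = ["A1", "A1", "B2", "B2"]
second_pass = ["A1", "A1", "B2", "A1"]
print(percent_agreement(first_pass, second_pass))
print(cohens_kappa(first_pass, second_pass))
```

The same functions serve for interrater comparisons (referee against
referee) and intrarater comparisons (the researcher's two passes).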

Item Classification: The researcher acquired the tests used in the
study from their publishers and prepared a tally sheet which included
spaces for the name of the test examined, the date of its publication, the
test's authors, the form examined, the level examined, and the grades for
which the test was designed. The tally sheet accommodated sixty items,
with multiple sheets used when a test's item count exceeded this figure.
Each item in each test was examined and assigned to an NAEP
subobjective. The grade or grade cluster covered by each test was used to
group the tests for analysis. Tests which were designed for use with
students enrolled in more than one grade were assigned to each eligible
group. Thus, a test designed to measure student performance in grades
nine and ten was analyzed in both groups. For study purposes, tests were
treated as if they were administered at the end of the school year. Each
group of tests was analyzed separately for the seventeen content
objectives and the 126 content subobjectives. Thus, eight analyses were
planned: for each grade, one on the objectives and one
on the subobjectives.
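
The grouping and counting steps described above can be sketched as follows.
The test names, grade ranges and category codes are hypothetical
placeholders rather than data from the study, and the function names are
illustrative only.

```python
from collections import defaultdict


def group_tests_by_grade(grade_ranges):
    """Assign each test to every grade group it is designed to cover."""
    groups = defaultdict(list)
    for test_name, grades in grade_ranges.items():
        for grade in grades:
            groups[grade].append(test_name)
    return dict(groups)


def count_categories(item_assignments):
    """Count distinct objectives and subobjectives among classified items.

    Each item is an (objective, subobjective) pair; subobjective is None
    when the objective has no subdivisions.
    """
    objectives = {objective for objective, _ in item_assignments}
    subobjectives = {
        (objective, sub) for objective, sub in item_assignments if sub is not None
    }
    return len(objectives), len(subobjectives)


# Hypothetical tests: one covers grades nine and ten, one grade nine only.
grade_ranges = {"Test X": [9, 10], "Test Y": [9]}
print(group_tests_by_grade(grade_ranges))

# Hypothetical tally for one test: four items classified.
tally = [("A", "A1"), ("A", "A2"), ("B", None), ("A", "A1")]
print(count_categories(tally))
```

A test spanning several grades appears in each of its grade groups, which
mirrors the rule that such a test is analyzed once per eligible group.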

Statistical Procedures: A chi-square one-sample test was used
to determine if the number of content objectives examined in each test
used in the study differed significantly. The same strategy was used to
analyze the subobjectives. The Statistical Package for the Social
Sciences (SPSSx) contains a program designed to perform a single-sample
chi-square and it was used to analyze the data.
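
For readers without access to SPSS, the one-sample chi-square can be
sketched in a few lines. This is an illustration under stated assumptions,
not the procedure actually run in the study: equal expected frequencies
across tests and a .05 significance level are assumed, the critical values
are the standard tabled ones, and the observed counts are hypothetical.

```python
# Standard tabled critical values of chi-square at the .05 level, by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def chi_square_one_sample(observed):
    """Return (statistic, df) assuming equal expected counts per test."""
    k = len(observed)
    if k < 2:
        raise ValueError("need counts for at least two tests")
    expected = sum(observed) / k
    if expected == 0:
        raise ValueError("at least one count must be positive")
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, k - 1


def differs_at_05(observed):
    """True when the counts differ significantly at the assumed .05 level."""
    statistic, df = chi_square_one_sample(observed)
    if df not in CRITICAL_05:
        raise ValueError("critical value not tabled for this df")
    return statistic > CRITICAL_05[df]


# Hypothetical numbers of objectives addressed by seven tests in one grade.
objectives_addressed = [9, 10, 11, 8, 10, 9, 10]
print(chi_square_one_sample(objectives_addressed))
print(differs_at_05(objectives_addressed))
```

With seven tests in a grade grouping, the test has six degrees of freedom,
so the statistic is compared with 12.592 under the assumed .05 level.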

Results 

Table 1 shows the tests used in the study and the grades they
covered. Eleven tests, representing five series, were used and seven
were assigned to each grade grouping. Three of the
eleven tests included norms for students in grades nine through twelve,
addressing all of the grades used in the study. While three tests could be
used to assess ninth grade student performance alone, there were no tests
which contained norms for eleventh grade or twelfth grade
students individually.

Since the tests designed to measure student performance in
eleventh and twelfth grades were the same, only one analysis was
necessary to cover both. Thus, the number of analyses used in the study
was reduced from eight to six. Tables 2, 3, and 4 show the relevant
information for the study. Each table presents a list of the tests involved
and the number of objectives and subobjectives addressed by each. Six
one-sample chi-square analyses were conducted and significance was not
reached in any analysis. Therefore, the standardized tests examined in
this study did not differ in terms of the numbers of NAEP objectives and
subobjectives examined. Consequently, it would be appealing to say that
any test used in the study for student assessment would yield the same
information. This type of statement would be simplistic because the tests
differ in the proportion and numbers of items directed toward individual
objectives and subobjectives. Tables 5 through 15 show the objectives
and subobjectives addressed by each test examined in the study.

The tests examined in the study did not differ significantly in
terms of the number of NAEP objectives or subobjectives addressed on a
grade by grade basis. Consequently, it seems as if similar information
pertaining to student assessment would be provided by each test if number
alone is used as the criterion.
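
The limits of using number alone as the criterion can be illustrated with two tests from the study. The sketch below compares the objectives listed for CTBS Level K (Table 8) and Stanford TASK 2 (Table 15). The objective names are normalized ("&" written as "and"), and the variable names are hypothetical.

```python
# Objectives addressed by two tests, transcribed from Tables 8 and 15.
CTBS_K = {
    "Number and Numeration Concepts", "Arithmetic Computation", "Sets",
    "Estimation and Measurement", "Exponents and Logarithms",
    "Algebraic Expressions", "Equations and Logic",
    "Probability and Statistics", "Geometry",
    "Business and Consumer Mathematics",
}
SAT_TASK_2 = {
    "Number and Numeration Concepts", "Arithmetic Computation",
    "Estimation and Measurement", "Exponents and Logarithms",
    "Algebraic Expressions", "Equations and Logic", "Functions",
    "Probability and Statistics", "Geometry",
    "Business and Consumer Mathematics",
}

shared = CTBS_K & SAT_TASK_2
print(len(CTBS_K), len(SAT_TASK_2), len(shared))           # prints: 10 10 9
print(sorted(CTBS_K - SAT_TASK_2), sorted(SAT_TASK_2 - CTBS_K))
# prints: ['Sets'] ['Functions']
```

Both tests address ten objectives, yet one covers Sets and the other Functions, so equal counts do not guarantee identical content.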

Although fourteen of the seventeen (82%) NAEP objectives were
addressed by the tests taken together, no test addressed more than eleven
(65%). Properties of Numbers, Mathematical Proof, and Attitude and
Interest items did not appear in any test. Similarly, sixty of the 126
subobjectives (48%) were examined, with no test dealing with more than
twenty-five (20%). This information shows that the NAEP system is not
being followed by the publishers of standardized norm-referenced
mathematics tests in the secondary school grades.
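
The coverage percentages quoted above follow from simple division, as this minimal sketch shows. The counts are taken from the paragraph; the helper name `percent` is hypothetical.

```python
TOTAL_OBJECTIVES = 17
TOTAL_SUBOBJECTIVES = 126


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


objectives_all_tests = percent(14, TOTAL_OBJECTIVES)        # all tests combined
objectives_best_test = percent(11, TOTAL_OBJECTIVES)        # most in any one test
subobjectives_all_tests = percent(60, TOTAL_SUBOBJECTIVES)
subobjectives_best_test = percent(25, TOTAL_SUBOBJECTIVES)

print(objectives_all_tests, objectives_best_test,
      subobjectives_all_tests, subobjectives_best_test)
# prints: 82 65 48 20
```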

Some subobjectives may be appropriate to the elementary school
grades only and their absence in the secondary grade tests may be
legitimate. Given this point, it still remains clear that test publishers are
not using the NAEP system in preparing their tests. With recent research
demonstrating the relatively low status of American students among
their peers in other countries, it may be time for test publishers to
consider the NAEP system when they construct their tests.

Table 1

Standardized Mathematics Tests Examined in the Study:
Test Level and Grade Coverage

                                                     Grade Coverage
Test                                  Level          9    10   11   12

California Achievement Tests          19             X    X
                                      20                  X    X    X
Comprehensive Tests of Basic Skills   J              X    X    X    X
                                      K                        X    X
Metropolitan Achievement Tests        Advanced 1     X
                                      Advanced 2          X    X    X
SRA Survey of Basic Skills            36             X
                                      37                  X    X    X
Stanford Achievement Test             Advanced       X
                                      TASK 1         X    X    X    X
                                      TASK 2         X    X    X    X

Total                                                7    7    7    7

Table 2

Number of NAEP Objectives and Subobjectives Addressed by
Standardized Mathematics Achievement Tests - Grade 9

                        Number of     Number of
Test                    Objectives    Subobjectives

CAT (19)                     9             22
CTBS (J)                    10             21
MAT (Advanced 1)             7             21
SAT (Advanced)               9             24
SAT (TASK 1)                 6             12
SAT (TASK 2)                10             20
SRA (36)                    10             18

Chi-Square    Objectives 1.77, Significance .94
              Subobjectives 4.90, Significance .56

Table 3

Number of NAEP Objectives and Subobjectives Addressed by
Standardized Mathematics Achievement Tests - Grade 10

                        Number of     Number of
Test                    Objectives    Subobjectives

CAT (19)                     9             22
CAT (20)                    11             25
CTBS (J)                    10             21
MAT (Advanced 2)             9             21
SAT (TASK 1)                 6             12
SAT (TASK 2)                10             20
SRA (37)                    10             19

Chi-Square    Objectives 1.41, Significance .96
              Subobjectives 3.50, Significance .74


Table 4

Number of NAEP Objectives and Subobjectives Addressed by Standardized
Mathematics Achievement Tests - Grades 11 & 12

                        Number of     Number of
Test                    Objectives    Subobjectives

CAT (20)                    11             25
CTBS (J)                    10             21
CTBS (K)                    10             20
MAT (Advanced 2)             9             24
SAT (TASK 1)                 6             12
SAT (TASK 2)                10             20
SRA (37)                    10             19

Chi-Square    Objectives 1.77, Significance .94
              Subobjectives 4.90, Significance .56


Table 5

NAEP Objectives and Subobjectives Addressed by
CAT - 19

Objective

Number and Numeration Concepts
Arithmetic Computation
Estimation and Measurement
Exponents & Logarithms
Algebraic Expressions
Functions
Probability and Statistics
Geometry
Business & Consumer Mathematics

Subobjective

Numeration Systems
Prime & Composite Numbers
Divisibility, Greatest Common Factor, Least Common Multiple
The Real Number Line
Whole Numbers
Rational Numbers
Ratio, Proportion & Percent
Rounding Off
Time
Money
Exponential & Logarithmic Equations
Combining Like Terms
Evaluating Expressions
Quadratic Equations & Their Graphs
Maxima and Minima of Functions
Permutations & Combinations
Outcomes, Samples, Spaces and Events
Probability of an Event
Measures of Central Tendency
Circles and Spheres
Cartesian Coordinates
Personal and Bank Records


Table 6

NAEP Objectives and Subobjectives Addressed by
CAT - 20

Objective

Number and Numeration Concepts
Arithmetic Computation
Sets
Estimation & Measurement
Exponents & Logarithms
Algebraic Expressions
Equations and Logic
Functions
Probability and Statistics
Geometry
Business & Consumer Mathematics

Subobjective

Numeration Systems
Odd and Even Numbers
Prime and Composite Numbers
Real Numbers
Rational Numbers
Whole Numbers
Rational Numbers
Ratio, Proportion & Percent
Rounding Off
Properties
Time
Weight
Area-Volume
Money
Exponential Equations
Combining Like Terms
Removing Parentheses
Solving Equations and Inequalities with Absolute Values
y-intercept
Permutations & Combinations
Outcomes, Samples, Spaces and Events
Measures of Central Tendency
Measures of Dispersion
Circles and Spheres
Personal and Bank Records



Table 7

NAEP Objectives and Subobjectives Addressed by
CTBS - Level J

Objective

Number and Numeration Systems
Arithmetic Computation
Sets
Estimation & Measurement
Algebraic Expressions
Equations & Logic
Functions
Probability and Statistics
Geometry
Trigonometry

Subobjective

Numeration Systems
Odd and Even Numbers
Whole Numbers
Rational Numbers
Ratio, Proportion, Percent
Computation with Approximate Data
Properties
Time
Money
Conversion Relations
Properties of Expressions
Combining Like Terms
Removing Parentheses
Finding Solutions in One Variable
Y Intercept
Permutations & Combinations
Probability of an Event
Descriptive Statistics
Measures of Central Tendency
Circles & Spheres
Relations among Functions



Table 8

NAEP Objectives and Subobjectives Addressed by
CTBS - Level K

Objective

Number and Numeration Concepts
Arithmetic Computation
Sets
Estimation & Measurement
Exponents & Logarithms
Algebraic Expressions
Equations & Logic
Probability & Statistics
Geometry
Business and Consumer Mathematics

Subobjective

Numeration Systems
Whole Numbers
Rational Numbers
Ratio, Proportion, Percent
Properties
Time
Money
Exponential & Logarithmic Equations
Properties of Expressions
Manipulation of Expressions
Combining Like Terms
Graphs of Equations
Maxima & Minima of Functions
Basic Probability Concepts
Permutations & Combinations
Outcomes, Samples, Spaces and Events
Measures of Central Tendency
Measures of Dispersion
Circles & Spheres
Personal and Bank Records



Table 9

NAEP Objectives and Subobjectives Addressed by
MAT - Advanced 1

Objective

Number and Numeration Concepts
Arithmetic Computation
Estimation and Measurement
Exponents & Logarithms
Algebraic Expressions
Probability & Statistics
Geometry

Subobjective

Decimal Place Value
Prime & Composite Numbers
Greatest Common Factor, Least Common Multiple
Whole Numbers
Rational Numbers
Ratio, Proportion & Percent
Rounding Off
Time
Distance
Area-Volume
Weight
Exponential Equations
Operations with Expressions
Evaluating Expressions
Probability of an Event
Representing Data
Points, Lines and Planes
Rays, Segments and Angles
Polygons and Polyhedra
Angle Measurement
Cartesian Coordinates


Table 10

NAEP Objectives and Subobjectives Addressed by
MAT - Advanced 2

Objective

Number and Numeration Concepts
Arithmetic Computation
Estimation and Measurement
Exponents and Logarithms
Algebraic Expressions
Functions
Probability and Statistics
Geometry
Trigonometry

Subobjective

Decimal-Place Value
Whole Numbers
Rational Numbers
Ratio, Proportion and Percent
Rounding Off
Time
Distance
Area-Volume
Conversion Relations
Exponential Equations
Factoring
Evaluating Expressions
Evaluating Functions
Y-Intercept
Basic Probability Concepts
Representing Data
Rays, Segments and Angles
Polygons and Polyhedra
Circles and Spheres
Angle Measurement
Trigonometric Functions

Table 11

NAEP Objectives and Subobjectives Addressed by
SRA Survey of Basic Skills - 36

Objective

Number and Numeration Concepts
Arithmetic Computation
Estimation and Measurement
Exponents and Logarithms
Algebraic Expressions
Functions
Probability and Statistics
Geometry
Trigonometry
Miscellaneous Topics

Subobjective

Numeration Systems
Prime and Composite Numbers
Greatest Common Factor - Least Common Multiple
Whole Numbers
Rational Numbers
Ratio, Proportion, Percent
Rounding Off
Time
Exponential Equations
Combining Like Terms
Removing Parentheses
Writing Equations of Quadratic Functions
Maxima and Minima of Functions
Permutations and Combinations
Measures of Central Tendency
Circles and Spheres
Trigonometric Functions
Sequences and Series


Table 12

NAEP Objectives and Subobjectives Addressed by
SRA Survey of Basic Skills - 37

Objective

Number and Numeration Concepts
Arithmetic Computation
Estimation and Measurement
Exponents and Logarithms
Algebraic Expressions
Functions
Probability and Statistics
Geometry
Miscellaneous Topics
Business and Consumer Mathematics

Subobjective

Numeration Systems
Odd and Even Numbers
Prime and Composite Numbers
Whole Numbers
Rational Numbers
Ratio, Proportion and Percent
Rounding Off
Time
Exponential Equations
Combining Like Terms
Operations with Expressions
Writing Equations of Quadratic Functions
Maxima and Minima of Functions
Permutations and Combinations
Probability of an Event
Measures of Central Tendency
Circles and Spheres
Binomial Expansion
Personal and Bank Records


Table 13

NAEP Objectives and Subobjectives Addressed by
Stanford Achievement Test - Advanced

Objective

Number and Numeration Concepts
Arithmetic Computation
Sets
Estimation and Measurement
Exponents and Logarithms
Algebraic Expressions
Probability and Statistics
Logic
Business and Consumer Mathematics

Subobjective

Decimal-Place Value
Prime and Composite Numbers
Greatest Common Factor - Least Common Multiple
Factorials
Real Numbers
Whole Numbers
Rational Numbers
Complex Numbers
Ratio, Proportion, Percent
Rounding Off
Set Operations
Time
Weight
Area-Volume
Conversion Relations
Exponential Equations
Evaluating Expressions
Basic Probability Concepts
Measures of Central Tendency
Representing Data
Circles and Spheres
Cartesian Coordinates
Personal and Bank Records


Table 14

NAEP Objectives and Subobjectives Addressed by
Stanford Achievement Test - TASK 1

Objective

Number and Numeration Concepts
Arithmetic Computation
Estimation and Measurement
Algebraic Expressions
Probability and Statistics
Business and Consumer Mathematics

Subobjective

Numeration Systems
Integers
Whole Numbers
Rational Numbers
Ratio, Proportion, Percent
Rounding Off
Time
Distance
Combining Like Terms
Evaluating Expressions
Permutations and Combinations
Personal and Bank Records


Table 15

NAEP Objectives and Subobjectives Addressed by
Stanford Achievement Test - TASK 2

Objective

Number and Numeration Concepts
Arithmetic Computation
Estimation & Measurement
Exponents and Logarithms
Algebraic Expressions
Equations and Logic
Functions
Probability and Statistics
Geometry
Business and Consumer Mathematics

Subobjective

Numeration Systems
Odd and Even Numbers
Prime & Composite Numbers
Whole Numbers
Rational Numbers
Ratio, Proportion, Percent
Rounding Off
Time
Weight
Conversion Relations
Exponential Equations
Evaluating Expressions
Solving Equations & Inequalities with Absolute Values
Evaluating Functions
Analysis of Graphs of Quadratic Functions
Maxima and Minima of Functions
Probability of an Event
Measures of Dispersion
Circles and Spheres
Personal and Bank Records